OCR-Free Document Understanding Transformer

Authors

Abstract

Understanding document images (e.g., invoices) is a core but challenging task since it requires complex functions such as reading text and a holistic understanding of the document. Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs. Although such OCR-based approaches have shown promising performance, they suffer from 1) high computational costs for using OCR; 2) inflexibility of OCR models on languages or types of documents; 3) OCR error propagation to the subsequent process. To address these issues, in this paper, we introduce a novel OCR-free VDU model named Donut, which stands for Document understanding transformer. As the first step in OCR-free VDU research, we propose a simple architecture (i.e., Transformer) with a pre-training objective (i.e., cross-entropy loss). Donut is conceptually simple yet effective. Through extensive experiments and analyses, we show that a simple OCR-free VDU model, Donut, achieves state-of-the-art performances on various VDU tasks in terms of both speed and accuracy. In addition, we offer a synthetic data generator that helps the model pre-training to be flexible in various languages and domains. The code, trained model, and synthetic data are available at https://github.com/clovaai/donut .
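The abstract names cross-entropy loss as the pre-training objective. The following is a minimal sketch of a token-level cross-entropy computation in plain Python, given only to illustrate the general idea; it is not the authors' implementation. The function names (`softmax`, `token_cross_entropy`), the toy vocabulary size, and the example logits are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_cross_entropy(logits_per_step, target_ids):
    """Mean negative log-likelihood of the target token at each step.

    logits_per_step: list of score lists, one per decoding step
                     (hypothetical decoder output).
    target_ids:      list of ground-truth token indices, one per step.
    """
    if not target_ids or len(logits_per_step) != len(target_ids):
        raise ValueError("need the same non-zero number of steps and targets")
    losses = []
    for logits, target in zip(logits_per_step, target_ids):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)


# Toy example: a vocabulary of 3 tokens and 2 decoding steps.
example_logits = [[2.0, 0.5, 0.1], [0.2, 0.1, 3.0]]
example_targets = [0, 2]
example_loss = token_cross_entropy(example_logits, example_targets)
```

The loss is low when the highest score at each step falls on the ground-truth token, and it equals the logarithm of the vocabulary size when the scores are uniform.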

Similar resources

Beyond OCR: Multi-faceted understanding of handwritten document characteristics

In the previous chapters, we proposed several features for writer identification, historical manuscript dating and localization separately. In this chapter, we present a summarization of the proposed features for different applications by proposing a joint feature distribution (JFD) principle to design novel discriminative features which could be the joint distribution of features on adjacent p...

Journal

Journal title: Lecture Notes in Computer Science

Year: 2022

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-031-19815-1_29